Correlated dynamics in human printing behavior 
Arrival times of requests to print in a student laboratory were analyzed. Inter-arrival times between subsequent requests follow a universal scaling law relating time intervals and the size of the request, indicating a scale-invariant dynamics with respect to the size. The cumulative distribution of file sizes is well described by a modified power law often seen in non-equilibrium critical systems. For each user, waiting times between their individual requests show long-range dependence and are broadly distributed from seconds to weeks. All results are incompatible with Poisson models, and may provide evidence of critical dynamics associated with voluntary thought processes in the brain.
Since the early work of Berger and Mandelbrot […] examining error clustering in telephone circuits, it has been recognized that standard Poisson models may be inadequate to describe electronic information networks. This was confirmed, for instance, by Leland et al. […], who studied network traffic and found that packet traces show scaling behavior. Observations of scaling behavior raise a number of questions about how to model these systems, optimize performance, or improve design. The effects include significant increases in response times and in required buffer sizes. In Ref. […] the authors show how the file size distribution of a web server affects the resulting network traffic. Large fluctuations in packet traffic or in the demand for resources in computer networks, which are inherent in critical systems, can significantly degrade worst-case performance […]. Scaling behavior has been found not only in the size distribution of files stored in computer systems […] and the sizes of web server requests […], but also in the physical structure of the internet and the hyperlink structure of the world-wide web […].

So far, no definitive causes have been established for the complexity of the modern information network. Of course, humans interact when they build the internet, make hyperlink connections, and send and receive information. Like traffic jams on roads, internet jams are produced by humans who act and react, often in response to information originating within the network or outside it. Various parts of the information network/user system are themselves complex systems, and one of the problems in modeling modern information networks is how to disentangle these effects.

Psychological experiments have demonstrated that correlated dynamics occur in individual human behavior […], even in situations where interactions with other humans are minimal […]. For instance, Ref. […] describes an experiment where subjects had to estimate the duration of time intervals from memory. The time series of errors in the estimates exhibits a 1/f power spectrum, showing that the errors are correlated in time. In contrast, the sequence of reaction times to an event showed no long-range correlation. The authors proposed that long-range dependence is associated with voluntary thought processes in the brain […]. Similar observations were made for the dynamics of moods and psychotic states. For instance, the distribution of time intervals between subsequent hospitalizations for schizophrenia is approximately a power law […]. A physical basis for these behaviors may be related to scale-free functional networks in the brain, which have recently been observed in situ […].

In order to better describe individual human behavior in a networked computing environment, we study a simple case where the use or demand is primarily associated with individual choice rather than with group dynamics. The particular quantity we focus on is the inter-arrival time between subsequent print requests made by users in a computing laboratory for university students. We find evidence of long-range correlations in the inter-arrival times between requests sent by individual users, as well as a broad distribution of these inter-arrival times. The totality of print requests from all users reveals a scaling law relating inter-arrival times and the sizes of the print requests. This law indicates that the same (rescaled) dynamics is responsible for requests to print small and large documents. It is similar to the laws recently observed for waiting times between successive earthquakes or solar flares […]. The scaling function for the rescaled inter-arrival times is approximately log-normal. The cumulative distribution of the sizes of print requests is well described by a modified power law, which is referred to as the distribution of superstatistics […] or the q-exponential of nonextensive statistical mechanics […]. An elementary stochastic process is studied that reproduces some, but not all, of the observed features. Our results support the hypothesis that the brain operates at or near a self-organized critical state […]. They also suggest the possibility of using data collected via the modern information network to systematically investigate models of human behavior.

The Department of Computing at Imperial College London maintains a networked printing system for staff and students. The student labs offer about 300 computer workspaces and are divided into different rooms, the largest one accommodating up to 150 students. The printers are networked and accessible from any machine in the department. A user selects a printer and submits her print job to a central server. The server records the time a request is submitted with a resolution of one second. It also records the size of the request, the user name, and the intended printer. This investigation focuses on requests sent to the printer "chrome", which is located in the largest room. The labs are closed between 23:00 and 7:00, but users can print during closure times when logged in remotely. The data used here cover the entire year 2003, and closure times have been included in the analysis. Table I gives relevant parameters for the data set studied, which can be accessed at […].



parameter                                            value
number of users                                      1122
number of users issuing more than three requests     1001
number of requests per year                          73853
mean document size                                   1.2 Mbytes
mean time between requests                           7.1 min
minimum time resolution                              1.0 sec

TABLE I: Parameters of the user and printing system in 2003.
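As a quick consistency check on Table I, the mean time between requests should equal the length of the record divided by the number of requests. The sketch below assumes the record spans 365 days of 86,400 s each (2003 is not a leap year):

```python
# Consistency check on Table I: mean gap = (length of 2003 in seconds) / (requests in 2003).
SECONDS_PER_YEAR = 365 * 24 * 3600  # 2003 is not a leap year
requests_per_year = 73853           # from Table I

mean_gap_sec = SECONDS_PER_YEAR / requests_per_year
mean_gap_min = mean_gap_sec / 60
print(f"mean time between requests: {mean_gap_sec:.0f} s = {mean_gap_min:.1f} min")
```

This gives about 427 s, matching the 7.1 min listed in the table.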

We first analyze the distribution of inter-arrival times between subsequent print requests for the entire year. Time differences are computed from the logged event times $T_i^S$ as

$$ t_i^S = T_{i+1}^S - T_i^S, \qquad 1 \le i < N^S. \qquad (1) $$

The superscript $S$ refers to the size of the print request in bytes and indicates that this set of times only includes requests that are larger than $S$. The quantity $N^S$ is the number of print requests that are larger than $S$. Time intervals of length zero are neglected in the analysis. For each chosen threshold $S$ we estimate $P_S(t)$, the probability density of a time interval $t$ between subsequent requests larger than $S$. To display this distribution we count the number of time differences in exponentially growing bins and normalize the count by the bin size. Fig. 1 shows that the shape of the waiting time distribution depends on the size threshold $S$ of the documents. This could indicate that different dynamical processes are responsible for small and large documents. However, all distributions are broad and show an anomaly near one day, which is related to the overnight closure of the labs.
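A minimal sketch of this procedure in Python, assuming the log is available as a time-sorted list of (time in seconds, size in bytes) pairs. The function names and the bin growth factor of 2 are illustrative choices, not taken from the original analysis:

```python
from bisect import bisect_right


def thresholded_intervals(events, S):
    """Eq. (1): gaps t_i^S = T_{i+1}^S - T_i^S between successive requests larger than S.

    events: list of (time_sec, size_bytes) pairs, assumed sorted by time.
    Zero-length gaps (same logged second) are dropped, as in the text.
    """
    times = [t for t, size in events if size > S]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return [g for g in gaps if g > 0]


def log_binned_density(samples, base=2.0):
    """Histogram with bins [base**k, base**(k+1)), each count divided by total * bin width.

    Assumes every sample is >= 1 (true for positive gaps at 1 s resolution).
    Returns (left_edges, densities) with sum(density * width) == 1.
    """
    largest = max(samples)
    edges = [1.0]
    while edges[-1] <= largest:          # grow until the last edge exceeds every sample
        edges.append(edges[-1] * base)
    counts = [0] * (len(edges) - 1)
    for x in samples:
        k = bisect_right(edges, x) - 1   # edges[k] <= x < edges[k + 1]
        counts[k] += 1
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    densities = [c / (len(samples) * w) for c, w in zip(counts, widths)]
    return edges[:-1], densities


if __name__ == "__main__":
    log = [(0, 500), (3, 2000), (3, 100), (10, 5000), (40, 800)]
    print(thresholded_intervals(log, S=0))      # [3, 7, 30]
    print(thresholded_intervals(log, S=1000))   # [7]
```

Dividing by the bin width is what turns counts in logarithmic bins into a density that can be compared across decades.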

FIG. 1: Distribution of inter-arrival times between subsequent requests to the printer "chrome" in 2003. Different curves are for different threshold sizes of the requests.

To determine whether a different dynamics is responsible for requests of different sizes, we implement a scaling argument similar to one recently put forward by Bak et al. […] to describe the waiting time statistics of earthquakes. The average time between requests $\langle t \rangle_S$ may provide a rescaling factor for the inter-arrival times, so that the distributions measured with different size thresholds $S$ collapse onto a single scaling function. Of course, $\langle t \rangle_S = T/N(>S) = 1/R(S)$. Here $T$ is the time span of the record, $R(S)$ is the rate of requests larger than $S$, and $N(>S)$ is the cumulative number of requests larger than size $S$. As shown in the inset of Fig. 2, $N(>S)$ is well described by a modified power law […]:



$$ N(>S) \propto \frac{1}{\left(1 + S/S^*\right)^{\gamma - 1}}, \qquad (2) $$

where $S^* = (7.9 \pm 0.5) \times 10^{\ldots}$ bytes and $\gamma - 1 = 0.76 \pm 0.03$.
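The modified power law and the rescaling factor $\langle t \rangle_S = T/N(>S)$ can be evaluated directly. This sketch assumes Eq. (2) has the form $N(>S) = N_0/(1+S/S^*)^{\gamma-1}$ with $N_0$ the total number of requests, uses $\gamma - 1 = 0.76$ and the Table I count, and sets `S_STAR` to a HYPOTHETICAL $10^5$ bytes because the power of ten of $S^*$ is not recoverable here:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def cumulative_count(S, n_total, s_star, gamma_minus_1):
    """Eq. (2) with the proportionality constant chosen so that N(>0) = n_total."""
    return n_total / (1.0 + S / s_star) ** gamma_minus_1


def mean_interval(S, span_sec, n_total, s_star, gamma_minus_1):
    """<t>_S = T / N(>S) = 1 / R(S): average gap between requests larger than S."""
    return span_sec / cumulative_count(S, n_total, s_star, gamma_minus_1)


S_STAR = 1e5  # HYPOTHETICAL value in bytes, chosen only for illustration
for S in (0, 1e5, 1e6, 1e7):
    t_mean = mean_interval(S, SECONDS_PER_YEAR, 73853, S_STAR, 0.76)
    print(f"S = {S:>10.0f} B   <t>_S = {t_mean:10.0f} s")
```

For $S \gg S^*$ the count falls off as $S^{-(\gamma-1)}$, so each tenfold increase in the threshold stretches the mean waiting time by roughly $10^{0.76} \approx 5.8$, whatever the true value of $S^*$.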
FIG. 2: Universal scaling law for the inter-arrival times between requests larger than size S, according to Eq. (3). The solid line is a fit to a log-normal function as described in the text. Data from the numerical simulation are also shown. The inset displays the cumulative distribution of request sizes.



We test the ansatz

$$ P_S(t) = R(S)\, g\big(t\, R(S)\big), \qquad (3) $$
where $g(x)$ is a scaling function and $x = tR(S)$ is the scaling variable. Fig. 2 shows the results of rescaling the different curves in Fig. 1 by their average rate. The scaling ansatz of Eq. (3) appears to hold over a wide range, about seven orders of magnitude in the scaling variable. This indicates that the same scale-invariant dynamics operates when users send requests of any size. The slight deviation from data collapse at short times is due to the finite temporal resolution of our data (one second). There is an additional deviation due to the diurnal period. The scaling function $g$ is close to a log-normal distribution:

$$ g(x) \propto \frac{1}{x} \exp\!\left[ -\frac{(\ln x - m)^2}{2\sigma^2} \right], \qquad (4) $$

with $m = -3.41 \pm 0.07$ and $\sigma = 2.16 \pm 0.04$, as also shown in Fig. 2. This feature is also found in numerical simulations of a stochastic process described later.
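The data-collapse test amounts to plotting $P_S(t)/R(S)$ against $x = tR(S)$ for every threshold. A sketch, assuming $g$ is the standard normalized log-normal density with location $m = -3.41$ and width $\sigma = 2.16$ (the two fitted parameters quoted above); whether the published fit carried an extra amplitude is not stated, so the normalization is an assumption:

```python
import math

M, SIGMA = -3.41, 2.16  # fitted log-normal parameters quoted in the text


def rescale(intervals_sec, rate_per_sec):
    """Scaling variable x = t R(S) for intervals measured at one size threshold S."""
    return [t * rate_per_sec for t in intervals_sec]


def g_lognormal(x, m=M, sigma=SIGMA):
    """Normalised log-normal density, assumed here as the scaling function g(x)."""
    z = (math.log(x) - m) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def predicted_density(t, rate_per_sec):
    """Scaling ansatz: P_S(t) = R(S) g(t R(S))."""
    return rate_per_sec * g_lognormal(t * rate_per_sec)


# Two thresholds with different rates give the same rescaled value at equal x = t R(S).
print(predicted_density(100.0, 0.01) / 0.01, predicted_density(10.0, 0.1) / 0.1)
```

Under the ansatz, the only information a threshold contributes is its rate $R(S)$; the shape is carried entirely by $g$.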

The inter-arrival times for all users do not necessarily give a good estimate of the times that pass between subsequent requests issued by a single user. To this end we study the inter-arrival times $t_i^u$ for each user $u$ printing more than three documents over the one-year period. In the discussion below we set the threshold $S = 0$, so that

$$ t_i^u = T_{i+1}^u - T_i^u, \qquad 1 \le i < N^u, \qquad (5) $$

where $T_i^u$ are the times of the requests issued by user $u$ and $N^u$ is their number.
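For the single-user analysis, a sketch that groups the log by user name and applies Eq. (5), assuming records are (time in seconds, user name) pairs. The threshold of four requests encodes the "more than three documents" criterion; zero-length gaps are kept here because the text does not say whether they were dropped for single users:

```python
from collections import defaultdict


def per_user_intervals(records, min_requests=4):
    """Eq. (5): t_i^u = T_{i+1}^u - T_i^u for every user with at least min_requests requests.

    records: iterable of (time_sec, user) pairs in any order; size threshold S = 0.
    """
    times_by_user = defaultdict(list)
    for t, user in records:
        times_by_user[user].append(t)
    intervals = {}
    for user, times in times_by_user.items():
        if len(times) < min_requests:      # "more than three documents"
            continue
        times.sort()
        intervals[user] = [later - earlier for earlier, later in zip(times, times[1:])]
    return intervals


def pooled(intervals_by_user):
    """Concatenate every user's list, the sample used to estimate P_ind(t)."""
    return [gap for gaps in intervals_by_user.values() for gap in gaps]


if __name__ == "__main__":
    log = [(0, "a"), (60, "a"), (5, "a"), (20, "a"), (1, "b"), (2, "b")]
    by_user = per_user_intervals(log)
    print(by_user)           # {'a': [5, 15, 40]}
    print(pooled(by_user))   # [5, 15, 40]
```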



Each user's list of inter-arrival times is concatenated to determine the probability density $P_{\rm ind}(t)$ of single-user inter-arrival times, shown in Fig. 3. This distribution is approximately a power law, $P_{\rm ind}(t) \sim t^{-\alpha}$, over several decades ranging from one minute to about a day, with an exponent $\alpha \approx 1.3$. We also analyze the inter-arrival times for the busiest single user, whose distribution is similar. For comparison, Fig. 3 shows an exponential distribution for a Poisson event process that has the same average rate, $\lambda = 3.4 \times 10^{\ldots}\ {\rm s}^{-1}$, as the process of the busiest single user. A critical system with a power-law distribution of intervals is a more accurate description of the data than a Poisson model of print requests.
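To see why the two descriptions differ so sharply, compare the tail probabilities $P(T > t)$ of an exponential distribution and of a power law with density exponent $\alpha = 1.3$. In the sketch below the one-minute lower end of the power law follows the range quoted above, while the Poisson rate of one request per hour is a HYPOTHETICAL illustrative value, not the busiest user's measured rate:

```python
import math


def exponential_survival(t, rate):
    """P(T > t) for a Poisson process: intervals are exponential with the given rate."""
    return math.exp(-rate * t)


def power_law_survival(t, t_min, alpha):
    """P(T > t) for a density proportional to t^-alpha on t >= t_min (alpha > 1, no upper cutoff)."""
    return 1.0 if t < t_min else (t / t_min) ** (1.0 - alpha)


RATE = 1.0 / 3600.0   # hypothetical Poisson rate: one request per hour
T_MIN = 60.0          # lower end of the power-law range (one minute)
for label, t in (("1 hour", 3600.0), ("1 day", 86400.0), ("1 week", 604800.0)):
    print(f"{label:>7}: Poisson {exponential_survival(t, RATE):.1e}   "
          f"power law {power_law_survival(t, T_MIN, 1.3):.1e}")
```

With these numbers a gap longer than a day has probability of about 0.11 under the power law but only about $4 \times 10^{-11}$ under the Poisson model, which is why gaps of days to weeks rule out the latter.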

To decide whether inter-arrival times are correlated, we measured the autocorrelation function of waiting times for single users. The autocorrelation $a_u(\tau)$ at lag $\tau$ is defined as

$$ a_u(\tau) = \frac{\sum_{i=1}^{N^u-1-\tau} s_i^u\, s_{i+\tau}^u}{\sum_{i=1}^{N^u-1} \left(s_i^u\right)^2}, \qquad (6) $$

where $s_i^u = t_i^u - \frac{1}{N^u-1}\sum_{j=1}^{N^u-1} t_j^u$. If the inter-arrival times are uncorrelated and independent, the arrival process of individual requests to print can be modeled as a fractal renewal process […]. Analyzing data separately for the three busiest users, we find that the autocorrelation function decays as $a_u(\tau) \sim \tau^{-\delta}$ with $\delta \approx 0.6$. When the order of the inter-arrival times for an individual user is shuffled randomly, this power law disappears and the waiting times become uncorrelated, with $a_u(\tau)$ independent of $\tau$ for $\tau \geq 1$. The sequence of inter-arrival times for individual users is therefore correlated over the entire time span of our data set.

FIG. 3: Single-user inter-arrival time distribution, averaged over all users and for the single busiest user. The solid straight line indicates a power-law distribution, $P_{\rm ind}(t) \sim t^{-\alpha}$ with $\alpha = 1.3$. For comparison, an exponential distribution with the same rate as the busiest user is shown as a dashed line.
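A direct implementation of the autocorrelation estimate and of the shuffle test might look as follows, assuming the estimator $a(\tau) = \sum_i s_i s_{i+\tau} / \sum_i s_i^2$ with $s_i$ the deviation of each gap from the user's mean gap. The toy sequence (a long run of short gaps followed by a long run of long gaps) is synthetic and only illustrates what correlated ordering does to $a(\tau)$:

```python
import random


def autocorrelation(t, max_lag):
    """a(tau) for tau = 0..max_lag from one user's inter-arrival times."""
    n = len(t)
    mean = sum(t) / n
    s = [x - mean for x in t]                 # deviations s_i from the user's mean gap
    norm = sum(v * v for v in s)
    return [sum(s[i] * s[i + lag] for i in range(n - lag)) / norm
            for lag in range(max_lag + 1)]


def shuffled_autocorrelation(t, max_lag, seed=0):
    """Same estimate after randomly permuting the order of the gaps, which destroys correlations."""
    rng = random.Random(seed)
    permuted = list(t)
    rng.shuffle(permuted)
    return autocorrelation(permuted, max_lag)


toy = [1.0] * 500 + [100.0] * 500                      # synthetic, strongly correlated gaps
print(round(autocorrelation(toy, 1)[1], 3))            # 0.997
print(round(shuffled_autocorrelation(toy, 1)[1], 3))   # close to 0
```

Shuffling keeps the distribution of gaps fixed while removing their order, so any drop in $a(\tau)$ after shuffling is attributable to temporal correlations alone.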

Our data show that models of criticality are relevant for describing individual human behavior in the modern information network. Lacking, at present, a microscopic dynamical model, we compare our observations with results from a simple stochastic process. Consider $N$ arrival streams of print requests. In each stream, time intervals between subsequent requests are independent random variables drawn from a truncated Pareto distribution; we neglect correlations between intervals. All intervals have the same probability distribution

$$ P(x) = C\, x^{-(1+k)}, \qquad 0 < a \le x \le b, \qquad (7) $$

where $a$ and $b$ are the points where the Pareto distribution is truncated and $C$ is a normalization constant. We choose the parameter $k = 0.3$, motivated by the results in Fig. 3, since then $1 + k$ matches $\alpha \approx 1.3$. The short-time cutoff $a = 2.5$ sec is set to reflect the fact that in some applications users must wait before a subsequent print job can be sent off. Most students leave after at most 8 years, so $b = 8$ years appears to be a reasonable choice. Generating approximately 73,000 requests in a year fixes the number of users close to $N = 1000$.
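The choice $N \approx 1000$ can be checked from the mean of the truncated Pareto distribution: in the stationary state each stream produces on average one request per mean interval. A sketch, assuming the density $P(x) = C x^{-(1+k)}$ on $a \le x \le b$ with $k = 0.3$, $a = 2.5$ s and $b = 8$ years of 365 days; the inverse-CDF sampler is a standard technique, not necessarily the one used for the published simulation:

```python
import random

YEAR = 365 * 24 * 3600.0
K, A, B = 0.3, 2.5, 8 * YEAR  # exponent k, lower cutoff a (s), upper cutoff b (s)


def sample_truncated_pareto(rng, k=K, a=A, b=B):
    """Inverse-CDF draw from P(x) = C x^-(1+k) on [a, b].

    The CDF is F(x) = (a^-k - x^-k) / (a^-k - b^-k); solving F(x) = u gives x.
    """
    u = rng.random()
    return (a ** -k - u * (a ** -k - b ** -k)) ** (-1.0 / k)


def mean_truncated_pareto(k=K, a=A, b=B):
    """Exact mean of the truncated Pareto density for 0 < k < 1."""
    c = k / (a ** -k - b ** -k)               # normalisation constant C
    return c * (b ** (1 - k) - a ** (1 - k)) / (1 - k)


if __name__ == "__main__":
    mean_gap = mean_truncated_pareto()
    per_stream = YEAR / mean_gap              # stationary requests per stream per year
    print(f"mean gap {mean_gap / 86400:.1f} days, {per_stream:.0f} requests/stream/year, "
          f"N for 73,000 requests/year ~ {73000 / per_stream:.0f}")
```

The mean interval comes out at about five days, i.e. roughly 73 requests per stream per year, so about 1000 streams reproduce the observed annual volume. The mean is dominated by the rare long gaps near $b$, which is also why the simulation needs years to become stationary.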

At the start of the numerical simulation we schedule an arrival event for each stream according to Eq. (7). Upon each arrival, the next arrival time is scheduled using the same distribution. With the above parameters, the system takes about 5 years to reach a statistically stationary state. As shown in Fig. 2, the inter-arrival times measured in the simulation compare fairly well with the real data. However, the real data have a significantly larger variance.
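The scheduling described above is a standard discrete-event loop over a priority queue. A minimal sketch using Python's `heapq`; the function name is illustrative, and the interval sampler is passed in so that a truncated Pareto draw (or any other distribution) can be plugged in:

```python
import heapq
import random


def simulate(n_streams, horizon, sample_interval, seed=1):
    """Superpose n_streams independent renewal streams; return sorted arrival times < horizon.

    sample_interval(rng) draws one inter-arrival time for a stream.
    """
    rng = random.Random(seed)
    heap = [(sample_interval(rng), s) for s in range(n_streams)]  # first arrival of each stream
    heapq.heapify(heap)
    arrivals = []
    while heap[0][0] < horizon:
        t, s = heapq.heappop(heap)            # earliest pending arrival over all streams
        arrivals.append(t)
        heapq.heappush(heap, (t + sample_interval(rng), s))  # schedule this stream's next request
    return arrivals


if __name__ == "__main__":
    # Toy run with exponential gaps (mean 10 s), only to exercise the scheduler.
    arrivals = simulate(5, 100.0, lambda rng: rng.expovariate(0.1))
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    print(len(arrivals), "arrivals; first superposed gaps:", [round(g, 1) for g in gaps[:5]])
```

For the model in the text one would call it with about 1000 streams, a truncated Pareto sampler and a horizon of five years, and keep only the arrivals in the last year, after the transient has died out.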






We also examined the time series defined by the number of print requests in each second. We calculated the power spectrum $S(f)$ of this time series and find $1/f^{\beta}$ behavior, as shown in Fig. 4. The exponent $\beta$ observed in the numerical simulation is fixed by the value of $k$ in Eq. (7) and is $\beta = 0.3$ […]. The real data instead show a larger value, $\beta \approx 0.5$, which indicates, just as the autocorrelation function $a_u(\tau)$ does, that the real arrival process is more complicated than a fractal renewal process. A more accurate model of individual user behavior in a computing network may be that of Davidsen and Schuster […].
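A sketch of the spectral estimate with NumPy: bin the arrivals into one-second counts, take the periodogram of the mean-removed series, and fit a straight line in log-log coordinates to obtain the exponent $\beta$ of $S(f) \sim 1/f^{\beta}$. The plain periodogram and the unweighted fit over all frequencies are assumptions of this sketch; the estimator and fitting range behind Fig. 4 are not specified here:

```python
import numpy as np


def counts_per_second(arrival_times, t0, t1):
    """Number of requests in each one-second bin of [t0, t1) (t0, t1 integer seconds)."""
    edges = np.arange(t0, t1 + 1)             # integer second boundaries
    counts, _ = np.histogram(arrival_times, bins=edges)
    return counts


def power_spectrum(counts):
    """Periodogram |FFT|^2 / n of the mean-removed series; frequencies in Hz, f = 0 dropped."""
    x = counts - counts.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    return freqs[1:], spec[1:]


def spectral_exponent(freqs, spec):
    """Least-squares slope of log S(f) against log f; returns beta for S(f) ~ 1/f^beta."""
    keep = spec > 0                           # guard against log(0)
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(spec[keep]), 1)
    return -slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    white = rng.poisson(1.0, 2 ** 14)         # uncorrelated counts: expect beta near 0
    print(round(spectral_exponent(*power_spectrum(white)), 2))
```

For uncorrelated (Poisson) counts the fitted exponent is close to zero; a fractal renewal process with $k = 0.3$ should instead give a slope near 0.3 over the scaling range.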
FIG. 4: Power spectrum of the time series defined by the print requests per second, based on the real arrival data and on the simulated arrivals in the fifth year. The solid line is a fit for the real data, the dashed one for the simulation results; see text.